Relationship extraction apparatus, relationship extraction method, and program

ABSTRACT

A relationship extraction device includes a memory; and a processor configured to execute obtaining a set of data {x0, . . . , xT−1}⊆X each having multiple elements and a set of data {y0=f(x0), . . . , yT−1=f(xT−1)}⊆Y each having multiple elements, where f is any mapping; generating an approximate operator that approximates a Perron-Frobenius operator K satisfying Kφ1(xt)=φ2(yt) for t=0, . . . , T−1, wherein φ1 is a feature mapping with respect to a positive definite kernel function k1 on X×X that takes C*-algebra values, and φ2 is a feature mapping with respect to a positive definite kernel function k2 on Y×Y that takes C*-algebra values; obtaining data xt and xs as targets of relationship extraction; and extracting a relationship between each element of xt and each element of xs by using the approximate operator.

TECHNICAL FIELD

The present invention relates to a relationship extraction device, a relationship extraction method, and a program.

BACKGROUND ART

For data having multiple elements, correlations between elements have been investigated in various technical fields (e.g., statistics, machine learning, and molecular dynamics).

For example, in the fields of statistics and machine learning, techniques have been proposed in which vectors formed by arranging multiple elements of data are mapped to a space called vv-RKHS (vector-valued reproducing kernel Hilbert space), and a function that represents a relationship between the elements is approximated on the vv-RKHS (Non-patent document 1). As the vv-RKHS is a space of vector-valued functions, it has the advantage of being capable of approximating the relationships among multiple elements at once. Note that techniques have also been proposed that apply the vv-RKHS to time series data in order to extract information on cyclic components from time series data representing the change in time of the relationship (Non-patent document 2).

Also, for example, in the fields of physics and molecular dynamics, techniques have been proposed that extract information on collective oscillations by a method called phase reduction (Non-patent document 3). Also, for example, in the field of machine learning, methods have been proposed that extract variables in a causal relationship by a method called Granger causality (Non-patent document 4).

The vv-RKHS described above is a generalization of the RKHS (reproducing kernel Hilbert space) used for analyzing data having a single element. By using the RKHS, data exhibiting complex behavior can be converted into data exhibiting simple behavior. Using this property, techniques have been studied that approximate complex time series data with a simple function on the RKHS (Non-patent document 5).

Here, as another generalization of the RKHS, a space called RKHM (reproducing kernel Hilbert C*-module) has been proposed, and theoretical analysis has been conducted in the field of physics (Non-patent document 6). The RKHM is a space of functions having values in a space called C*-algebra, and hence, can be used for approximating a C*-algebra-valued function. Note that C*-algebra is a generalization of the set of all complex numbers and the set of all matrices, and is a space having the concepts of conjugation and norm.

RELATED ART DOCUMENTS

Non-Patent Document

-   [Non-patent document 1] Mauricio A. Alvarez, Lorenzo Rosasco, and Neil D. Lawrence, ‘Kernels for vector-valued functions: a review,’ Computer Science and Artificial Intelligence Laboratory Technical Report, MIT-CSAIL-TR-2011-033 CBCL-301, 2011.
-   [Non-patent document 2] Keisuke Fujii, Yoshinobu Kawahara, ‘Dynamic mode decomposition in vector-valued reproducing kernel Hilbert spaces for extracting dynamical structure among observables,’ Neural Networks 117, pp. 94-103, 2019.
-   [Non-patent document 3] Hiroya Nakao, Sho Yasui, Masashi Ota, Kensuke Arai and Yoji Kawamura, ‘Phase reduction and synchronization of a network of coupled dynamical elements exhibiting collective oscillations,’ Chaos 28, 045103, 2018.
-   [Non-patent document 4] Songting Li, Yanyang Xiao, Douglas Zhou and David Cai, ‘Causal inference in nonlinear systems: Granger causality versus time-delayed mutual information,’ Phys. Rev. E 97, 052216, 2018.
-   [Non-patent document 5] Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda, Yoichi Matsuo and Yoshinobu Kawahara, ‘Krylov Subspace Method for Nonlinear Dynamical Systems with Random Noise,’ arXiv: 1909.03634, 2019.
-   [Non-patent document 6] Jaeseong Heo, ‘Reproducing kernel Hilbert C*-modules and kernels associated with cocycles,’ J. Math. Phys. 49, 103507, 2008.

SUMMARY OF THE INVENTION

Problem to be Solved by the Invention

Meanwhile, the RKHS is only capable of handling data having a single element, and hence, cannot describe a relationship among multiple elements. Also, the phase reduction aims at approximating collective behavior of data, and hence, cannot represent a relationship among elements. On the other hand, although the vv-RKHS takes a relationship among multiple elements into consideration, the proximity between vector-valued functions included in the vv-RKHS is measured by a single complex value. Therefore, for example, in the case where the purpose is to completely extract information on the relationship of any two elements from among the multiple elements, the number of relationships between two data items each having n elements is n², and hence, it becomes necessary to represent the proximity of the functions corresponding to these data items by n² complex numbers.

In contrast, if the RKHM is used, the proximity of functions can be measured by a C*-algebra value such as a matrix. However, there is no framework that uses the RKHM for the purpose of extracting relationships between elements of data having multiple elements.

One embodiment of the present invention was devised in view of the above points, and has an object to extract relationships between elements held in data using the RKHM.

Means for Solving Problem

As described above, in order to achieve the object, a relationship extraction device according to one embodiment includes a first obtaining means configured to obtain a set of data {x₀, . . . , x_(T−1)}⊆X each having multiple elements and a set of data {y₀=f(x₀), . . . , y_(T−1)=f(x_(T−1))}⊆Y each having multiple elements, where f is any mapping; a generation means configured to generate an approximate operator that approximates a Perron-Frobenius operator K satisfying Kφ₁(x_(t))=φ₂(y_(t)) for t=0, . . . , T−1, wherein φ₁ is a feature mapping with respect to a positive definite kernel function k₁ on X×X that takes C*-algebra values, and φ₂ is a feature mapping with respect to a positive definite kernel function k₂ on Y×Y that takes C*-algebra values; a second obtaining means configured to obtain data x_(t) and x_(s) as targets of relationship extraction; and an extraction means configured to extract a relationship between each element of x_(t) and each element of x_(s) by using the approximate operator.

Advantageous Effects of the Invention

Relationships between elements held in data can be extracted using the RKHM.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating an example of a functional configuration of a relationship extraction device according to a present embodiment;

FIG. 2 is a flow chart illustrating an example of an approximate operator generation process according to the present embodiment;

FIG. 3 is a flow chart illustrating an example of a relationship extraction process according to the present embodiment;

FIG. 4A is a diagram (part 1) illustrating an example of an evaluation result;

FIG. 4B is a diagram (part 1) illustrating an example of an evaluation result;

FIG. 5A is a diagram (part 2) illustrating an example of an evaluation result;

FIG. 5B is a diagram (part 2) illustrating an example of an evaluation result;

FIG. 5C is a diagram (part 2) illustrating an example of an evaluation result;

FIG. 5D is a diagram (part 2) illustrating an example of an evaluation result;

FIG. 6A is a diagram (part 3) illustrating an example of an evaluation result;

FIG. 6B is a diagram (part 3) illustrating an example of an evaluation result;

FIG. 7A is a diagram (part 4) illustrating an example of an evaluation result;

FIG. 7B is a diagram (part 4) illustrating an example of an evaluation result; and

FIG. 8 is a diagram illustrating an example of a hardware configuration of a relationship extraction device according to the present embodiment.

EMBODIMENTS FOR CARRYING OUT THE INVENTION

In the following, one embodiment of the present invention will be described. In the present embodiment, a relationship extraction device 10 will be described that, given data including one or more items each having multiple elements, can extract a relationship (i.e., interrelation) between any two elements held in the data by using an RKHM.

<Theoretical Construction>

First, a theoretical construction of the present embodiment will be described.

<<Settings>>

Let X be a space to which data having multiple elements belongs, and let {x₀, x₁, . . . }⊆X be a set of given data. Let A be a C*-algebra, and consider an A-valued positive definite kernel k:X×X→A. Here, a mapping k:X×X→A is called an A-valued positive definite kernel when it satisfies the following Condition 1 and Condition 2.

(Condition 1) For any x,y∈X, k(x,y)=k(y,x)* where * denotes conjugate.

(Condition 2) Let m be any natural number. For any x₀, x₁, . . . , x_(m-1)∈X and any c₀, c₁, . . . , c_(m-1)∈A, the following double summation is positive.

$\begin{matrix}\left\lbrack {{Math}.1} \right\rbrack &  \\{\sum\limits_{t = 0}^{m - 1}{\sum\limits_{s = 0}^{m - 1}{c_{t}^{*}{k\left( {x_{t},x_{s}} \right)}c_{s}}}} & \end{matrix}$

Here, “positive” means being a positive element of the C*-algebra, which is a generalization of a Hermitian matrix all of whose eigenvalues are greater than or equal to 0 (i.e., a Hermitian positive semi-definite matrix).

Given an A-valued positive definite kernel k, a mapping φ from X to the A-valued functions is defined by φ(x)=k(⋅,x). This mapping φ is also referred to as a feature map.

For a natural number m, x₀, x₁, . . . , x_(m-1)∈X, and c₀, c₁, . . . , c_(m-1)∈A, let M_(k,0) be the space consisting of all linear combinations of the following form.

$\begin{matrix}\left\lbrack {{Math}.2} \right\rbrack &  \\{\sum\limits_{t = 0}^{m - 1}{{\phi\left( x_{t} \right)}c_{t}}} & \end{matrix}$

Also, for natural numbers m and m′, x₀, x₁, . . . , x_(m-1), y₀, y₁, . . . , y_(m′-1)∈X, and c₀, c₁, . . . , c_(m-1), d₀, d₁, . . . , d_(m′-1)∈A, an operation <⋅,⋅>_(k) on M_(k,0) is defined as follows:

$\begin{matrix}\left\lbrack {{Math}.3} \right\rbrack &  \\{\left\langle {{\sum\limits_{t = 0}^{m - 1}{{\phi\left( x_{t} \right)}c_{t}}},{\sum\limits_{t = 0}^{m^{\prime} - 1}{{\phi\left( y_{t} \right)}d_{t}}}} \right\rangle_{k} = {\sum\limits_{t = 0}^{m - 1}{\sum\limits_{s = 0}^{m^{\prime} - 1}{c_{t}^{*}{k\left( {x_{t},y_{s}} \right)}d_{s}}}}} & \end{matrix}$

The operation <⋅,⋅>_(k) defined in this way has the properties of an A-valued inner product. In other words, the operation has the following four properties with respect to u,v,w∈M_(k,0) and c,d∈A:

-   <u,v>_(k)=<v,u>_(k)*
-   <u,u>_(k) is positive
-   <u,u>_(k)=0 is equivalent to u=0
-   <u,vc+wd>_(k)=<u,v>_(k)c+<u,w>_(k)d

By using this inner product <⋅,⋅>_(k) and the norm ∥⋅∥ of A, a complex-valued norm can be defined as follows:

$\begin{matrix}\left\lbrack {{Math}.4} \right\rbrack &  \\{{\left\| v \right\|}_{k} = {\left\| \left\langle {v,v} \right\rangle_{k} \right\|}^{\frac{1}{2}}} & \end{matrix}$

The space obtained by completing M_(k,0) with respect to this norm is denoted as M_(k), and is referred to as a reproducing kernel Hilbert C*-module (RKHM) with respect to k. M_(k) can be configured uniquely. Also, in M_(k), an A-valued magnitude |⋅|_(k) can be defined as follows:

$\begin{matrix}\left\lbrack {{Math}.5} \right\rbrack &  \\{\left| v \right|_{k} = \left\langle {v,v} \right\rangle_{k}^{\frac{1}{2}}} & \end{matrix}$
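As an illustration of the difference between the norm ∥⋅∥_(k) and the A-valued magnitude |⋅|_(k), the following is a minimal sketch assuming that A is the set of n×n matrices, that the norm of A is the spectral norm, and that <v,v>_(k) has already been evaluated as a Hermitian matrix G. The function names and the numerical values of G are hypothetical and are not taken from the present description.

```python
import numpy as np

def module_norm(G):
    """Scalar norm ||v||_k = || <v,v>_k ||^(1/2), with the spectral norm on A."""
    return np.sqrt(np.linalg.norm(G, 2))

def module_magnitude(G):
    """A-valued magnitude |v|_k = <v,v>_k^(1/2) (Hermitian square root)."""
    lam, U = np.linalg.eigh(G)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative round-off
    return (U * np.sqrt(lam)) @ U.conj().T

# Hypothetical value of <v, v>_k for n = 2 (Hermitian, eigenvalues 1 and 3).
G = np.array([[2.0, 1.0], [1.0, 2.0]])
norm_v = module_norm(G)      # a single non-negative number
abs_v = module_magnitude(G)  # an n x n matrix whose square is G
```

The norm collapses <v,v>_(k) into one number, whereas the magnitude keeps an n×n matrix, which is what allows element-wise relationships to be retained.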

Assuming that each x_(t) (t=0, 1, . . . ) being an element of X has n elements, x_(t) is denoted as x_(t)=[x_(t,0), . . . , x_(t,n-1)]. In the case where the C*-algebra A is the entirety of n×n matrices, an A-valued positive definite kernel k can be configured by using the following complex-valued positive definite kernel:

$\begin{matrix}\left\lbrack {{Math}.6} \right\rbrack &  \\\overset{\sim}{k} & \end{matrix}$

Note that in the text of the present description, for the sake of convenience, a symbol having “˜” added to the top of x is written as “˜x”.

In fact, if each (i,j) component of an n×n matrix k(x_(t),x_(s)) is defined by ˜k(x_(t,i),x_(s,j)) with respect to the elements x_(t) and x_(s) of X, it can be shown that k is an n×n matrix-valued positive definite kernel. ˜k(x_(t,i),x_(s,j)) represents the proximity of x_(t,i) and x_(s,j); therefore, each (i,j) component of k(x_(t),x_(s)) (i.e., the inner product of φ(x_(t)) and φ(x_(s))) represents the proximity of the i-th component x_(t,i) of x_(t) and the j-th component x_(s,j) of x_(s).
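The construction above can be sketched as follows. This is a minimal illustration assuming real scalar elements and a Gaussian kernel as ˜k; the choice of kernel, the function names, and the sample data are assumptions made for illustration only.

```python
import numpy as np

def scalar_kernel(a, b, sigma=1.0):
    """Assumed scalar kernel ~k: a Gaussian kernel on real numbers."""
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

def matrix_kernel(x_t, x_s, sigma=1.0):
    """n x n matrix k(x_t, x_s) whose (i, j) entry is ~k(x_{t,i}, x_{s,j})."""
    x_t = np.asarray(x_t, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    return scalar_kernel(x_t[:, None], x_s[None, :], sigma)

def block_gram(data, sigma=1.0):
    """(nT) x (nT) matrix whose (t, s) block is k(x_t, x_s)."""
    flat = np.asarray(data, dtype=float).reshape(-1)  # index t*n + i
    return scalar_kernel(flat[:, None], flat[None, :], sigma)

# Hypothetical data: T = 3 items, each with n = 2 elements.
data = np.array([[0.0, 1.0], [0.5, 2.0], [1.5, 3.0]])
k01 = matrix_kernel(data[0], data[1])
gram = block_gram(data)
```

Here `k01[i, j]` measures the proximity of the i-th element of the first item and the j-th element of the second item. Positive semi-definiteness of `gram` corresponds to Condition 2 for this sample, and `matrix_kernel(y, x)` being the conjugate transpose of `matrix_kernel(x, y)` corresponds to Condition 1.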

<<Relationship of Data in RKHM>>

Let X and Y be spaces to which data belongs, and assume that the following Formula (1) holds for x₀, x₁, . . . , x_(T−1)∈X and y₀, y₁, . . . , y_(T−1)∈Y.

y _(t) =f(x _(t))  (1)

where f is a mapping from X to Y that is nonlinear in general.

Let k₁ be a positive definite kernel on X, let k₂ be a positive definite kernel on Y, let φ₁ be a feature map with respect to k₁, and let φ₂ be a feature map with respect to k₂. In order to express Formula (1) described above as a formula in the following spaces,

$\begin{matrix}\left\lbrack {{Math}.7} \right\rbrack &  \\{M_{k_{1}},M_{k_{2}}} & \end{matrix}$

assume that the following mapping,

$\begin{matrix}\left\lbrack {{Math}.8} \right\rbrack &  \\{K:\left. M_{k_{1}}\rightarrow M_{k_{2}} \right.} & \end{matrix}$

satisfies the following Formula (2),

Kφ ₁(x _(t))=φ₂(y _(t))  (2)

Such K is referred to as a Perron-Frobenius operator. In the case where x₀, x₁, . . . , x_(T−1)∈X constitute time series data, if setting X=Y, y_(t)=x_(t+1), and k₁=k₂, then f is a mapping representing time evolution, and thereby, K is also a mapping representing time evolution.

<<Approximation of Perron-Frobenius Operator by Orthonormal Projection>>

In the following, it is assumed that an element x_(t) (t=0, 1, . . . ) of X has n elements, and is expressed as x_(t)=[x_(t,0), . . . , x_(t,n-1)]. Also, assume that the C*-algebra A is the entirety of n×n matrices, and as described above, an A-valued positive definite kernel k is configured using a complex-valued positive definite kernel ˜k.

At this time, consider approximating K that satisfies Formula (2) described above, to analyze f by using the approximated K, predict y_(t) from a given x_(t), and compare matrix-valued inner products of elements of X obtained by such analysis and prediction (i.e., measure the proximity). The value of an inner product (proximity) takes a matrix value, and its components represent the proximity of elements, and thereby, relationships between the elements can be extracted. In the following, (i) a case of applying a Perron-Frobenius operator K when X=Y, y_(t)=x_(t+1), and k₁=k₂=k; and (ii) a case of applying the Perron-Frobenius operator K when X≠Y, will be described.

(i) The case of X=Y, y_(t)=x_(t+1), and k₁=k₂=k

In this case, by solving the minimization problem in Formula (3) shown below,

$\begin{matrix}\left\lbrack {{Math}.9} \right\rbrack &  \\\hat{K} & \end{matrix}$

is obtained to approximate K. Note that in the text of the present description, for the sake of convenience, a symbol having “^” added to the top of x is written as “^x”.

$\begin{matrix}\left\lbrack {{Math}.10} \right\rbrack &  \\{\min\limits_{{{\phi(x_{t + 1})} = {\hat{K}{\phi(x_{t})}\ {({{t = 0},\ldots,{T - 2}})}}},\ {\hat{K} \in {L(V_{T})}}}\left| {{\phi\left( x_{T} \right)} - {\hat{K}{\phi\left( x_{T - 1} \right)}}} \right|_{k}} & (3)\end{matrix}$

where V_(T) is a set of all linear combinations expressed as in the following formula:

$\begin{matrix}\left\lbrack {{Math}.11} \right\rbrack &  \\{\sum\limits_{t = 0}^{T - 1}{{\phi\left( x_{t} \right)}{c_{t}\left( {c_{t} \in A} \right)}}} & \end{matrix}$

Also, L(V_(T)) is a set of all A-linear operators from V_(T) to V_(T) (i.e., operators L that satisfy L(vc)=(Lv)c for any c∈A and any v∈V_(T)).

In order to solve the minimization problem shown in Formula (3) described above, an orthonormal projection from M_(k) to V_(T) is calculated. Here, an orthonormal projection P from M_(k) to V_(T) is an A-linear operator from M_(k) to V_(T) that satisfies P²=P and P=P*. P can be calculated by configuring an orthonormal system {q₀, q₁, . . . , q_(T−1)} of V_(T). The orthonormal system {q₀, q₁, . . . , q_(T−1)} of V_(T) is a set of q_(t)∈V_(T) such that <q_(t),q_(s)>_(k)=0 for s≠t, and <q_(t),q_(t)>_(k) is a nonzero Hermitian matrix c that satisfies c²=c (in this case, q_(t) is called normal).

Given time series data x₀, x₁, . . . , x_(T−1)∈X, an orthonormal system {q₀, q₁, . . . , q_(T−1)} of V_(T) can be configured by sequentially executing the following Step 1 and Step 2 for t=0, 1, . . . , T−1.

Step 1: If t=0, set ˜q₀=φ(x₀). On the other hand if t≠0, for s=0, . . ., t−1, set r_(s,t)=<φ(x_(t)), q_(s)>_(k), and set ˜q_(t) as follows:

$\begin{matrix}\left\lbrack {{Math}.12} \right\rbrack &  \\{{\overset{\sim}{q}}_{t} = {{\phi\left( x_{t} \right)} - {\sum\limits_{s = 0}^{t - 1}{r_{s,t}q_{s}}}}} & \end{matrix}$

Step 2: Next, let ε be a real number greater than or equal to 0. If ∥˜q_(t)∥_(k)≤ε, set q_(t)=0; otherwise, the following is executed. Let the eigenvalues of <˜q_(t), ˜q_(t)>_(k) be λ_(t,0)≥ . . . ≥λ_(t,n-1), and let m_(t) be the maximum index i that satisfies λ_(t,i)>ε². Also, let U_(t)D_(t)U_(t)* be the eigendecomposition of <˜q_(t), ˜q_(t)>_(k). Here, D_(t) is a matrix having diagonal components of λ_(t,0), . . . , λ_(t,n-1) and non-diagonal components of all zero. U_(t) is a matrix in which eigenvectors corresponding to the respective eigenvalues λ_(t,0), . . . , λ_(t,n-1) are arranged in this order. At this time, <˜q_(t), ˜q_(t)>_(k) is a Hermitian positive definite matrix, and hence, if ∥˜q_(t)∥_(k)>ε, it has at least one eigenvalue greater than ε², so that m_(t) is well defined.

Therefore, let ^D_(t) be a matrix having the following diagonal components,

$\begin{matrix}\left\lbrack {{Math}.13} \right\rbrack &  \\{\frac{1}{\sqrt{\lambda_{t,0}}},\ldots,\frac{1}{\sqrt{\lambda_{t,m_{t}}}},0,\ldots,0} & \end{matrix}$

and having non-diagonal components of all zero, and set b_(t)=U_(t)^D_(t)U_(t)*. Further, set q_(t)=˜q_(t)b_(t). q_(t) is normal, and hence, is an orthonormal vector.
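Step 1 and Step 2 can be sketched numerically as follows. This is a minimal illustration under several assumptions that are not part of the present description: real scalar elements, a Gaussian kernel as ˜k, and a representation of every element of V_(T) by its stacked coefficient blocks, so that <u,v>_(k) is evaluated as C_u* G C_v with G the block Gram matrix. In addition, the sketch uses the coefficient <q_(s), φ(x_(t))>_(k) in the subtraction so that the subtracted term is the projection onto q_(s) under an inner product that is A-linear in its second argument; this is a convention chosen for the sketch. All function and variable names are hypothetical.

```python
import numpy as np

def scalar_kernel(a, b, sigma=1.0):
    """Assumed scalar kernel ~k (Gaussian)."""
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

def orthonormal_system(data, eps=1e-6, sigma=1.0):
    """Return coefficient blocks of q_0..q_{T-1} and the block Gram matrix G.

    Each q_t = sum_s phi(x_s) c_s is stored as an (n*T, n) array stacking the
    c_s, so that <u, v>_k = Cu^* G Cv.
    """
    T, n = data.shape
    flat = data.reshape(-1)
    G = scalar_kernel(flat[:, None], flat[None, :], sigma)
    qs = []
    for t in range(T):
        phi_t = np.zeros((n * T, n))
        phi_t[t * n:(t + 1) * n] = np.eye(n)       # coefficients of phi(x_t)
        q_tilde = phi_t.copy()
        for q in qs:                                # Step 1
            r = q.conj().T @ G @ phi_t              # <q_s, phi(x_t)>_k
            q_tilde = q_tilde - q @ r
        gram_t = q_tilde.conj().T @ G @ q_tilde     # <~q_t, ~q_t>_k
        gram_t = (gram_t + gram_t.conj().T) / 2.0   # symmetrize round-off
        lam, U = np.linalg.eigh(gram_t)             # eigenvalues, ascending
        if lam[-1] <= eps ** 2:                     # Step 2: ||~q_t||_k <= eps
            qs.append(np.zeros((n * T, n)))
            continue
        keep = lam > eps ** 2
        inv_sqrt = np.zeros_like(lam)
        inv_sqrt[keep] = 1.0 / np.sqrt(lam[keep])
        b_t = (U * inv_sqrt) @ U.conj().T           # U ^D U^*
        qs.append(q_tilde @ b_t)
    return qs, G

# Hypothetical data: T = 3 items, each with n = 2 elements.
qs, G = orthonormal_system(np.array([[0.0, 1.5], [3.0, 4.5], [6.0, 7.5]]))
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, whereas the text orders them in descending order; since b_(t) is assembled from the full eigendecomposition, the ordering does not affect the result.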

Let Q_(T) be an A-linear mapping that maps a vector [c₀, . . . , c_(T−1)] of arrayed T elements of the C*-algebra A to the following linear combination:

$\begin{matrix}\left\lbrack {{Math}.14} \right\rbrack &  \\{\sum\limits_{t = 0}^{T - 1}{q_{t}c_{t}}} & \end{matrix}$

Similarly, let Φ_(T) be an A-linear mapping that maps the vector [c₀, . . . , c_(T−1)] to the linear combination in which q_(t) is replaced by φ(x_(t)), i.e., the sum of φ(x_(t))c_(t) over t=0, . . . , T−1. Also, let B_(T) be a matrix having diagonal components of b₀, . . . , b_(T−1) and non-diagonal components of all zero, and let R_(T) be a T×T matrix having r_(s,t) as the (s,t) component. Note that each component of R_(T) is an element of A. By executing Step 1 and Step 2 described above, it can be shown that Q_(T)=Φ_(T)B_(T)−Q_(T)R_(T); therefore, Q_(T)=Φ_(T)B_(T)(I+R_(T))⁻¹ where I is an identity matrix.

For Q_(T) configured as described above, Q_(T)Q_(T)* is an orthonormal projection from M_(k) to ˜V_(T) (i.e., if setting P=Q_(T)Q_(T)*, P is an orthonormal projection) where ˜V_(T) is a set of all linear combinations expressed as in the following formula:

$\begin{matrix}\left\lbrack {{Math}.15} \right\rbrack &  \\{\sum\limits_{t = 0}^{T - 1}{q_{t}{c_{t}\left( {c_{t} \in A} \right)}}} & \end{matrix}$

The orthonormal projection minimizes the difference, i.e., for any element v of M_(k) and any element w of ˜V_(T), |v−w|_(k)−|v−Pv|_(k) is positive. Also, in the case of setting ε=0 at Step 2 described above, any element v of V_(T) can be expressed as follows:

$\begin{matrix}\left\lbrack {{Math}.16} \right\rbrack &  \\{v = {\sum\limits_{t = 0}^{T - 1}{q_{t}c_{t}}}} & \end{matrix}$

Therefore, it can be shown that V_(T)=˜V_(T). Here, c_(t) is an n×n matrix.

Therefore, it can be understood that ^K fulfilling Formula (3) described above satisfies ^Kφ(x_(T−1))=Q_(T)Q_(T)*φ(x_(T)), and satisfies ^Kφ(x_(t))=Q_(T)Q_(T)*φ(x_(t+1)) (t=0, . . . , T−1). Meanwhile, for an element v of M_(k), an element of V_(T) that minimizes the difference is Q_(T)Q_(T)*v. Therefore, Kv is approximated with ^KQ_(T)Q_(T)*v. Here, ^KQ_(T)Q_(T)*v can be expressed as follows:

$\begin{matrix}\left\lbrack {{Math}.17} \right\rbrack &  \\{\begin{aligned}{\hat{K}Q_{T}Q_{T}^{*}v} &= {\hat{K}\Phi_{T}B_{T}\left( {I + R_{T}} \right)^{- 1}Q_{T}^{*}v} \\ &= {Q_{T}Q_{T}^{*}\Phi_{T + 1}B_{T}\left( {I + R_{T}} \right)^{- 1}Q_{T}^{*}v} \\ &= {Q_{T}\left( {I + R_{T}} \right)^{- *}B_{T}^{*}\Phi_{T}^{*}\Phi_{T + 1}B_{T}\left( {I + R_{T}} \right)^{- 1}Q_{T}^{*}v}\end{aligned}} & \end{matrix}$

where, −* denotes the Hermitian transposition of an inverse matrix.

By Q_(T), a vector of arrayed T elements of A and an element of V_(T) can be considered as identical, and hence, Q_(T) can be regarded as an operator representing a coordinate transformation. Therefore, K is approximated with the following T×T matrix whose components are elements of A:

$\begin{matrix}\left\lbrack {{Math}.18} \right\rbrack &  \\{\left( {I + R_{T}} \right)^{- *}B_{T}^{*}\Phi_{T}^{*}\Phi_{T + 1}B_{T}\left( {I + R_{T}} \right)^{- 1}} & \end{matrix}$

Φ_(T)*Φ_(T+1) is a T×T matrix whose (s,t) component is k(x_(s),x_(t+1))∈A, and hence, the formula,

$\begin{matrix}\left\lbrack {{Math}.19} \right\rbrack &  \\{\left( {I + R_{T}} \right)^{- *}B_{T}^{*}\Phi_{T}^{*}\Phi_{T + 1}B_{T}\left( {I + R_{T}} \right)^{- 1}} & \end{matrix}$

can be calculated in practice. Therefore, ˜K_(T) is set as follows:

$\begin{matrix}\left\lbrack {{Math}.20} \right\rbrack &  \\{{\overset{\sim}{K}}_{T} = {\left( {I + R_{T}} \right)^{- *}B_{T}^{*}\Phi_{T}^{*}\Phi_{T + 1}B_{T}\left( {I + R_{T}} \right)^{- 1}}} & \end{matrix}$

and this ˜K_(T) is referred to as an “approximate Perron-Frobenius operator”.

Thus, by using this approximate Perron-Frobenius operator ˜K_(T), Kv can be approximated as Q_(T)˜K_(T)Q_(T)*v for any v∈M_(k).
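The calculation of a matrix of this form can be sketched as follows. This is a minimal illustration and not the procedure of Step 1 and Step 2 itself: it assumes ε=0 and an invertible block Gram matrix G=Φ_(T)*Φ_(T), and it obtains an orthonormal basis of V_(T) from a Cholesky factorization G=LL*, taking W=L⁻* as the coefficient matrix of the basis (playing the role of B_(T)(I+R_(T))⁻¹). The resulting matrix W*Φ_(T)*Φ_(T+1)W is therefore a counterpart of ˜K_(T) expressed in a possibly different orthonormal basis. The Gaussian kernel, the sample series, and all names are assumptions.

```python
import numpy as np

def scalar_kernel(a, b, sigma=0.5):
    """Assumed scalar kernel ~k (Gaussian)."""
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

def block_gram(A, B, sigma=0.5):
    """Matrix whose (t, s) block is the n x n matrix k(a_t, b_s)."""
    a, b = A.reshape(-1), B.reshape(-1)
    return scalar_kernel(a[:, None], b[None, :], sigma)

def approximate_pf_operator(series, sigma=0.5):
    """Return (K, W, G, Gs) for a series x_0, ..., x_T given as a (T+1, n) array."""
    X, Y = series[:-1], series[1:]        # x_t and x_{t+1} for t = 0..T-1
    G = block_gram(X, X, sigma)           # Phi_T^* Phi_T
    Gs = block_gram(X, Y, sigma)          # Phi_T^* Phi_{T+1}
    L = np.linalg.cholesky(G)             # G = L L^*
    W = np.linalg.inv(L).conj().T         # W^* G W = I (orthonormal basis)
    K = W.conj().T @ Gs @ W               # counterpart of ~K_T
    return K, W, G, Gs

# Hypothetical time series with n = 2 elements per time step.
series = np.array([[0.0, 5.0], [1.0, 6.5], [2.0, 8.5], [3.0, 11.0], [4.0, 14.0]])
K, W, G, Gs = approximate_pf_operator(series)
coords_now = W.conj().T @ G    # block column t holds Q_T^* phi(x_t)
coords_next = W.conj().T @ Gs  # block column t holds Q_T^* phi(x_{t+1})
```

In this sketch, applying `K` to the coordinates of φ(x_(t)) yields the coordinates of the projection of φ(x_(t+1)), which is the one-step prediction property used in the text.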

(ii) The Case of X≠Y

In this case, let V_(T) be a set of all linear combinations expressed as in the following formula,

$\begin{matrix}\left\lbrack {{Math}.21} \right\rbrack &  \\{\sum\limits_{t = 0}^{T - 1}{{\phi_{1}\left( x_{t} \right)}{c_{t}\left( {c_{t} \in A} \right)}}} & \end{matrix}$

and let W_(T) be a set of all linear combinations expressed as in the following formula:

$\begin{matrix}\left\lbrack {{Math}.22} \right\rbrack &  \\{\sum\limits_{t = 0}^{T - 1}{{\phi_{2}\left( y_{t} \right)}{c_{t}\left( {c_{t} \in A} \right)}}} & \end{matrix}$

Further, in substantially the same way as in (i) described above, an orthonormal system {q₀, q₁, . . . , q_(T−1)} of V_(T) is configured, and by using this orthonormal system {q₀, q₁, . . . , q_(T−1)}, Q_(T) is configured. Further, let ^K be a linear mapping from V_(T) to W_(T) that satisfies ^Kφ₁(x_(t))=φ₂(y_(t)), to approximate Kv with ^KQ_(T)Q_(T)*v. Therefore, also for W_(T), an orthonormal system is configured in substantially the same way as in (i) described above, and by using this orthonormal system, P_(T) is configured by a method substantially the same as the method of configuring Q_(T) described above.

Also, in substantially the same way as in (i) described above, Q_(T) is decomposed as Q_(T)=Φ_(T)B_(T)(I+R_(T))⁻¹ where Φ_(T) is an A-linear mapping that maps a vector [c₀, . . . , c_(T−1)] of arrayed T elements of A to the following linear combination:

$\begin{matrix}\left\lbrack {{Math}.23} \right\rbrack &  \\{\sum\limits_{t = 0}^{T - 1}{{\phi_{1}\left( x_{t} \right)}c_{t}}} & \text{ }\end{matrix}$

In substantially the same way, P_(T) is decomposed as P_(T)=Ψ_(T)C_(T)(I+S_(T))⁻¹ where Ψ_(T) is an A-linear mapping that maps a vector [c₀, . . . , c_(T−1)] of arrayed T elements of A to the following linear combination:

$\begin{matrix}\left\lbrack {{Math}.24} \right\rbrack &  \\{\sum\limits_{t = 0}^{T - 1}{{\phi_{2}\left( y_{t} \right)}c_{t}}} & \text{ }\end{matrix}$

Also, C_(T) is a T×T matrix with respect to W_(T), configured by a method substantially the same as the method of configuring B_(T) described above. Similarly, S_(T) is a T×T matrix with respect to W_(T), configured by a method substantially the same as the method of configuring R_(T) described above.

At this time, as ^Kφ₁(x_(t))=φ₂(y_(t)) is satisfied, ^KΦ_(T)=Ψ_(T) is derived; therefore, K is approximated with a T×T matrix that has components of elements of A, and is expressed as follows:

$\begin{matrix}\left\lbrack {{Math}.25} \right\rbrack &  \\{{P_{T}^{*}\hat{K}Q_{T}} = {\left( {I + S_{T}} \right)^{- *}C_{T}^{*}\Psi_{T}^{*}\Psi_{T}B_{T}\left( {I + R_{T}} \right)^{- 1}}} & \end{matrix}$

In other words, the approximate Perron-Frobenius operator is set as follows:

$\begin{matrix}\left\lbrack {{Math}.26} \right\rbrack &  \\{{\overset{\sim}{K}}_{T} = {\left( {I + S_{T}} \right)^{- *}C_{T}^{*}\Psi_{T}^{*}\Psi_{T}B_{T}\left( {I + R_{T}} \right)^{- 1}}} & \end{matrix}$
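The case of X≠Y can be sketched in the same manner. The following minimal illustration again replaces Step 1 and Step 2 with a Cholesky-based orthonormalization (assuming ε=0 and invertible Gram matrices Φ_(T)*Φ_(T) and Ψ_(T)*Ψ_(T)), and assumes, for simplicity, that k₁ and k₂ are built from the same Gaussian scalar kernel and that items of X and Y have the same number n of elements. The mapping `f` and all names are hypothetical.

```python
import numpy as np

def scalar_kernel(a, b, sigma=0.5):
    """Assumed scalar kernel used for both k_1 and k_2 (Gaussian)."""
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

def block_gram(A, B, sigma=0.5):
    """Matrix whose (t, s) block is the n x n kernel matrix of a_t and b_s."""
    a, b = A.reshape(-1), B.reshape(-1)
    return scalar_kernel(a[:, None], b[None, :], sigma)

def orthonormal_coefficients(G):
    """Coefficient matrix W with W^* G W = I, from G = L L^*."""
    return np.linalg.inv(np.linalg.cholesky(G)).conj().T

def cross_space_operator(X, Y, sigma=0.5):
    Gx = block_gram(X, X, sigma)        # Phi_T^* Phi_T
    Gy = block_gram(Y, Y, sigma)        # Psi_T^* Psi_T
    Wx = orthonormal_coefficients(Gx)   # role of B_T (I + R_T)^{-1}
    Wy = orthonormal_coefficients(Gy)   # role of C_T (I + S_T)^{-1}
    K = Wy.conj().T @ Gy @ Wx           # counterpart of [Math. 26]
    return K, Wx, Wy, Gx, Gy

def f(x):
    """Hypothetical nonlinear mapping from X to Y."""
    return np.array([x[0] + x[1], x[0] * x[1]])

X = np.array([[0.0, 2.0], [1.0, 3.5], [5.0, 7.0]])
Y = np.array([f(x) for x in X])
K, Wx, Wy, Gx, Gy = cross_space_operator(X, Y)
```

In this sketch, `K` maps the coordinates Q_(T)*φ₁(x_(t)) to the coordinates P_(T)*φ₂(y_(t)) for each t in the given data.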

<<Decomposition of Approximate Perron-Frobenius Operator>>

As described above, an A-valued positive definite kernel k is configured with a complex-valued positive definite kernel ˜k, and is an n×n matrix in which each component takes a complex value. Therefore, letting C be the complex number field, the approximate Perron-Frobenius operator ˜K_(T) can be regarded as ˜K_(T)∈C^(nT×nT).

Here, assume that there exist eigenvalues ˜λ₀, . . . , ˜λ_(nT-1) and corresponding eigenvectors ˜v₀, . . . , ˜v_(nT-1) for the approximate Perron-Frobenius operator ˜K_(T). By setting v_(m)=[˜v_(m), 0, . . . , 0] and λ_(m)=diag{˜λ_(m), 0, . . . , 0}, ˜K_(T)v_(m)=v_(m)λ_(m) is satisfied. Also, if [˜v₀, . . . , ˜v_(nT-1)] is invertible, the following formula holds:

$\begin{matrix}\left\lbrack {{Math}.27} \right\rbrack &  \\{{Q_{T}^{*}{\phi\left( x_{0} \right)}} = {\sum\limits_{m = 0}^{{nT} - 1}{v_{m}{c_{m}\left( {c_{m} \in A} \right)}}}} & \text{ }\end{matrix}$

Here, by the definition of K, φ(x_(t))=K^(t)φ(x₀) holds. Therefore, by using an approximate Perron-Frobenius operator, φ(x_(t)) is approximated with Q_(T)˜K_(T)^(t)Q_(T)*φ(x₀). Similarly, by using an approximate Perron-Frobenius operator, φ(x_(s)) is approximated with Q_(T)˜K_(T)^(s)Q_(T)*φ(x₀).

Then, k(x_(t),x_(s))=<φ(x_(t)), φ(x_(s))>_(k) can be approximated as in the following Formula (4):

$\begin{matrix}\left\lbrack {{Math}.28} \right\rbrack &  \\{\begin{aligned}\left\langle {{\phi\left( x_{t} \right)},{\phi\left( x_{s} \right)}} \right\rangle_{k} &\approx \left\langle {{Q_{T}{\overset{\sim}{K}}_{T}^{t}{\sum\limits_{m = 0}^{{nT} - 1}{v_{m}c_{m}}}},{Q_{T}{\overset{\sim}{K}}_{T}^{s}{\sum\limits_{m = 0}^{{nT} - 1}{v_{m}c_{m}}}}} \right\rangle_{k} \\ &= \left\langle {{{\overset{\sim}{K}}_{T}^{t}{\sum\limits_{m = 0}^{{nT} - 1}{v_{m}c_{m}}}},{{\overset{\sim}{K}}_{T}^{s}{\sum\limits_{m = 0}^{{nT} - 1}{v_{m}c_{m}}}}} \right\rangle \\ &= \left\langle {{\sum\limits_{m = 0}^{{nT} - 1}{v_{m}\lambda_{m}^{t}c_{m}}},{\sum\limits_{m = 0}^{{nT} - 1}{v_{m}\lambda_{m}^{s}c_{m}}}} \right\rangle \\ &= {\sum\limits_{m,{m^{\prime} = 0}}^{{nT} - 1}{{c_{m}^{*}\left( \lambda_{m}^{*} \right)}^{t}\left\langle {v_{m},v_{m^{\prime}}} \right\rangle\lambda_{m^{\prime}}^{s}c_{m^{\prime}}}} \\ &= {\sum\limits_{m,{m^{\prime} = 0}}^{{nT} - 1}{{\overline{{\overset{\sim}{\lambda}}_{m}}}^{\, t}{{\overset{\sim}{\lambda}}_{m^{\prime}}^{s}\left( {{\overset{\sim}{v}}_{m}^{*}{\overset{\sim}{v}}_{m^{\prime}}} \right)}c_{m}^{*}c_{m^{\prime}}}}\end{aligned}} & (4)\end{matrix}$

where <u_(m), v_(m)>=u_(m)*v_(m) for u_(m) and v_(m).

By approximation and decomposition executed in this way, for example, it becomes possible to analyze the behavior when having s,t→∞; the cycle of change in k(x_(t),x_(s)) (i.e., a matrix in which each (i,j) component represents the proximity between the i-th component of x_(t) and the j-th component of x_(s)); and the like.
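The approximation of k(x_(t),x_(s)) by powers of the approximate operator can be sketched as follows. This minimal illustration evaluates the second line of Formula (4) directly with matrix powers instead of expanding in eigenvectors, and obtains the orthonormal basis by a Cholesky factorization (assuming ε=0 and an invertible Gram matrix) instead of Step 1 and Step 2. The Gaussian kernel, the sample series, and all names are assumptions.

```python
import numpy as np

def scalar_kernel(a, b, sigma=0.5):
    """Assumed scalar kernel ~k (Gaussian)."""
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

def block_gram(A, B, sigma=0.5):
    """Matrix whose (t, s) block is the n x n matrix k(a_t, b_s)."""
    a, b = A.reshape(-1), B.reshape(-1)
    return scalar_kernel(a[:, None], b[None, :], sigma)

def fit(series, sigma=0.5):
    """Return the operator matrix K and z0 = Q_T^* phi(x_0)."""
    X, Y = series[:-1], series[1:]
    n = series.shape[1]
    G = block_gram(X, X, sigma)                         # Phi_T^* Phi_T
    Gs = block_gram(X, Y, sigma)                        # Phi_T^* Phi_{T+1}
    W = np.linalg.inv(np.linalg.cholesky(G)).conj().T   # W^* G W = I
    K = W.conj().T @ Gs @ W
    z0 = (W.conj().T @ G)[:, :n]
    return K, z0

def approx_kernel(K, z0, t, s):
    """Approximate k(x_t, x_s) as (K^t z0)^* (K^s z0)."""
    zt = np.linalg.matrix_power(K, t) @ z0
    zs = np.linalg.matrix_power(K, s) @ z0
    return zt.conj().T @ zs

# Hypothetical time series with n = 2 elements per time step.
series = np.array([[0.0, 5.0], [1.0, 6.5], [2.0, 8.5], [3.0, 11.0], [4.0, 14.0]])
K, z0 = fit(series)
relation = approx_kernel(K, z0, t=1, s=2)  # n x n matrix of proximities
moduli = np.abs(np.linalg.eigvals(K))      # growth or decay rates per mode
```

For t and s within the range of the given data, the approximation in this sketch reproduces the kernel matrix of x_(t) and x_(s); for larger t and s, the moduli of the eigenvalues indicate whether the corresponding terms in Formula (4) grow, decay, or oscillate.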

<Functional Configuration of Relationship Extraction Device 10>

Next, a functional configuration of the relationship extraction device 10 according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of a functional configuration of the relationship extraction device 10 according to the present embodiment.

As illustrated in FIG. 1, the relationship extraction device 10 according to the present embodiment includes an approximate operator generation processing unit 100, a relationship extraction processing unit 200, and a storage unit 300.

The storage unit 300 stores a set of data {x₀, x₁, . . . , x_(T−1)} each having multiple elements. Also, the storage unit 300 stores an approximate Perron-Frobenius operator ˜K_(T) generated by the approximate operator generation processing unit 100, and relationships extracted by the relationship extraction processing unit 200 (i.e., an n×n matrix as an approximation result shown in Formula (4) described above).

The approximate operator generation processing unit 100 takes as input a set of data {x₀, x₁, . . . , x_(T−1)} each having multiple elements, and executes an approximate operator generation process of generating an approximate Perron-Frobenius operator ˜K_(T). Here, the approximate operator generation processing unit 100 includes an obtaining unit 101 and an approximate operator generation unit 102. The obtaining unit 101 obtains the set of data {x₀, x₁, . . . , x_(T−1)} each having multiple elements from the storage unit 300. The approximate operator generation unit 102 generates an approximate Perron-Frobenius operator ˜K_(T) from {x₀, x₁, . . . , x_(T−1)} obtained by the obtaining unit 101.

The relationship extraction processing unit 200 takes as input data x_(s) and x_(t) as targets of relationship extraction, and executes a relationship extraction process to extract relationships between the data. Here, the relationship extraction processing unit 200 includes an obtaining unit 201 and a relationship extraction unit 202.

The obtaining unit 201 obtains the data x_(s) and x_(t) as targets of relationship extraction from the storage unit 300. The relationship extraction unit 202 extracts relationships between x_(s) and x_(t) obtained by the obtaining unit 201.

Note that the configuration of the relationship extraction device 10 illustrated in FIG. 1 is an example, and another configuration may be adopted. For example, the approximate operator generation processing unit 100 and the relationship extraction processing unit 200 may be included in different devices or equipment.

<Approximate Operator Generation Process>

Next, an approximate operator generation process according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a flow chart illustrating an example of an approximate operator generation process according to the present embodiment.

The obtaining unit 101 of the approximate operator generation processing unit 100 obtains data each having multiple elements from the storage unit 300 (also obtains y₀, y₁, . . . , y_(T−1) in the case of (ii) described above) (Step S101).

Next, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 sets t←0 where t is an index indicating the data obtained at Step S101 described above (Step S102).

Next, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 generates an orthonormal vector q_(t) by using φ(x₀), . . . , φ(x_(t)) as described in the above (i) and (ii) (φ₁(x₀), . . . , φ₁(x_(t)) in the case of (ii) described above) (Step S103). Note that an orthonormal vector of W_(T) is also generated in the case of (ii) described above.

Next, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 sets t←t+1 (Step S104). Further, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 determines whether t<T (Step S105).

If t<T is determined at Step S105 described above, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 returns to Step S103. Thus, for t=0, . . . , T−1, Step S103 described above is executed, and the orthonormal system {q₀, q₁, . . . , q_(T−1)} is obtained. Note that in the case of (ii) described above, the orthonormal system of W_(T) is also obtained.

If t<T is not determined at Step S105 described above, the approximate operator generation unit 102 of the approximate operator generation processing unit 100 generates an approximate Perron-Frobenius operator ˜K_(T) by using the orthonormal system {q₀, q₁, . . . , q_(T−1)} as described in the above (i) and (ii) (also using the orthonormal system of W_(T) in the case of (ii)) (Step S106).

<Relationship Extraction Process>

Next, a relationship extraction process according to the present embodiment will be described with reference to FIG. 3. FIG. 3 is a flow chart illustrating an example of a relationship extraction process according to the present embodiment.

The obtaining unit 201 of the relationship extraction processing unit 200 obtains the data x_(s) and x_(t) as targets of relationship extraction from the storage unit 300 (Step S201).

Next, the relationship extraction unit 202 of the relationship extraction processing unit 200 extracts relationships between x_(s) and x_(t) obtained at Step S201 described above (Step S202). In other words, the relationship extraction unit 202 approximates k(x_(t),x_(s))=<φ(x_(t)), φ(x_(s))>_(k) by Formula (4) described above. Accordingly, an n×n matrix is obtained in which each (i,j) component represents the proximity between the i-th component of x_(t) and the j-th component of x_(s) (i.e., the relationship between x_(t,i) and x_(s,j)), and the relationships between x_(s) and x_(t) are extracted.

<Application Examples>

In the following, several application examples using the approximate Perron-Frobenius operator will be described.

<<Anomaly Detection>>

Suppose that each of x₀, x₁, . . . , x_(T−1)∈X has n items of time series data. In other words, suppose that x_(t) includes x_(t,0), . . . , x_(t,n-1) as n items of time series data, denoted as x_(t)=[x_(t,0), . . . , x_(t,n-1)]. In the case where φ(x_(t)) has been obtained, φ(x_(t+1)) can be predicted by using an approximate Perron-Frobenius operator ˜K_(T) obtained by the method described in (i) described above. This prediction can be obtained by Q_(T)˜K_(T)Q_(T)*φ(x_(t)) as described above.

At this time, assuming that the following equation holds,

$\begin{matrix}\left\lbrack {{Math}.29} \right\rbrack &  \\{{{\overset{\sim}{K}}_{T}Q_{T}^{*}{\phi\left( x_{t} \right)}} = {\sum\limits_{s = 0}^{T - 1}{{\phi\left( x_{s} \right)}c_{s}}}} & \text{ }\end{matrix}$

the (j,j) component of the following formula,

$\begin{matrix}\left\lbrack {{Math}.30} \right\rbrack &  \\{\left| {{\sum\limits_{s = 0}^{T - 1}{{\phi\left( x_{s} \right)}c_{s}}} - {\phi\left( x_{t} \right)}} \right|}_{k}^{2} & \text{ }\end{matrix}$

is equivalent to the following:

$\begin{matrix}\left\lbrack {{Math}.31} \right\rbrack &  \\{\left\| {{\sum\limits_{s = 0}^{T - 1}{\sum\limits_{i = 0}^{n - 1}{\left( c_{s} \right)_{i,j}{\overset{\sim}{\phi}\left( x_{s,i} \right)}}}} - {\overset{\sim}{\phi}\left( x_{t,j} \right)}} \right\|}_{\overset{\sim}{k}} & \text{ }\end{matrix}$

where ˜φ is a feature map with respect to ˜k, and (c_(s))_(i,j) is the (i,j) component of c_(s) being an n×n matrix.

Therefore, in the case where the (j,j) component of the following formula is large,

$\begin{matrix}\left\lbrack {{Math}.32} \right\rbrack &  \\{\left| {{\sum\limits_{s = 0}^{T - 1}{{\phi\left( x_{s} \right)}c_{s}}} - {\phi\left( x_{t} \right)}} \right|}_{k}^{2} & \text{ }\end{matrix}$

it can be understood that an anomaly occurs in the j-th data item among the n items of time series data.
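As a rough illustration of how the diagonal of the squared residual above could be evaluated numerically, the following sketch expands it in terms of matrix-valued kernel blocks only. It assumes the usual RKHM identity <φ(x)c, φ(y)d>_(k)=c*k(x,y)d and takes the coefficient matrices c_(s) as given inputs rather than computing them from ˜K_(T). The function name `anomaly_scores` and the array layout are hypothetical and are not part of the embodiment.

```python
import numpy as np

def anomaly_scores(G, g_t, k_tt, C):
    """Diagonal of |sum_s phi(x_s) c_s - phi(x_t)|_k^2 (hypothetical helper).

    G    : (T, T, n, n) array, G[s, r] = k(x_s, x_r)
    g_t  : (T, n, n) array,    g_t[s]  = k(x_s, x_t)
    k_tt : (n, n) array,       k(x_t, x_t)
    C    : (T, n, n) array,    C[s]    = c_s
    """
    # Conjugate transpose of every coefficient matrix: c_s^*
    Ch = np.conj(np.transpose(C, (0, 2, 1)))
    # sum_{s,r} c_s^* k(x_s, x_r) c_r
    quad = np.einsum('sab,srbc,rcd->ad', Ch, G, C)
    # sum_s c_s^* k(x_s, x_t)
    cross = np.einsum('sab,sbc->ac', Ch, g_t)
    # <v, v>_k for v = sum_s phi(x_s) c_s - phi(x_t)
    resid = quad - cross - cross.conj().T + k_tt
    # The (j, j) component is the score of the j-th data item.
    return np.real(np.diag(resid))
```

Under these assumptions, a large j-th entry of the returned vector would indicate that the j-th item is poorly reproduced by the prediction.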

<<Causal Estimation (Part 1)>>

For n items of time series data, x₀, x₁, . . . , x_(T−1)∈X are defined such that x_(s,i+m″n) is data at time s+m″ of the i-th item of the time series data. At this time, consider the case of t=s in Formula (4) described above. For ˜λ_(m) having a magnitude close to 1,

$\begin{matrix}\left\lbrack {{Math}.33} \right\rbrack &  \\{{\overset{\sim}{\lambda}}_{m}^{s}{\overset{\sim}{\lambda}}_{m}^{s}\left( {{\overset{\sim}{v}}_{m}^{*}{\overset{\sim}{v}}_{m}} \right)c_{m}^{*}c_{m}} & \text{ }\end{matrix}$

is unchanged by the change in s; therefore, for ˜λ_(m) having the magnitude close to 1, in the sum of Formula (4) described above, by calculating only the following,

$\begin{matrix}\left\lbrack {{Math}.34} \right\rbrack &  \\{{\overset{\sim}{\lambda}}_{m}^{s}{\overset{\sim}{\lambda}}_{m}^{s}\left( {{\overset{\sim}{v}}_{m}^{*}{\overset{\sim}{v}}_{m}} \right)c_{m}^{*}c_{m}} & \text{ }\end{matrix}$

an unchanged part regardless of the change in s in the approximation of k(x_(s),x_(s)) (the proximity between x_(s) and x_(s)) can be extracted. Therefore, if the value of the (i,j+m″n) component of the sum is large, then x_(s,i) and x_(s,j+m″n) are close regardless of s; conversely, if the value of the (i,j+m″n) component is small, then x_(s,i) and x_(s,j+m″n) are distant regardless of s. In other words, it can be understood that the change in the i-th data from among the n items of time series data is a cause of the change in the j-th data.

<<Causal Estimation (Part 2)>>

Suppose that each of x₀, x₁, . . . , x_(T−1)∈X has n items of time series data. In the case where the change in j-th data from among the n items of time series data is a cause of the change in i-th data, consider data ˜x₀, ˜x₁, . . . , ˜x_(T−1) each obtained by removing the j-th component in x_(t) (t=0, . . . , T−1). In other words, it is set as ˜x_(t)=[x_(t,0), . . . , x_(t,j-1), x_(t,j+1), . . . , x_(t,n−1)].

At this time, in the case of considering that ˜K_(T) is generated with ˜x₀, ˜x₁, . . . , ˜x_(T−1) to predict ˜x_(S) for S≥T, this prediction is calculated by Q_(T)˜K_(T)Q_(T)*φ(˜x_(S-1)); however, among the components of ˜x_(S), it is expected that the component corresponding to the i-th data is not approximated well. Therefore, by comparing the components of the following formula,

$\begin{matrix}\left\lbrack {{Math}.35} \right\rbrack &  \\{\left| {{Q_{T}{\overset{\sim}{K}}_{T}Q_{T}^{*}{\phi\left( {\overset{\sim}{x}}_{S - 1} \right)}} - {\phi\left( {\overset{\sim}{x}}_{S} \right)}} \right|}_{k}^{2} & \text{ }\end{matrix}$

data that changes due to the change in the j-th data as the cause can be identified. In other words, in the case where the (i,i) component of the following formula is large,

$\begin{matrix}\left\lbrack {{Math}.36} \right\rbrack &  \\{\left| {{Q_{T}{\overset{\sim}{K}}_{T}Q_{T}^{*}{\phi\left( {\overset{\sim}{x}}_{S - 1} \right)}} - {\phi\left( {\overset{\sim}{x}}_{S} \right)}} \right|}_{k}^{2} & \text{ }\end{matrix}$

it can be understood that the change in the j-th data is the cause of the change in the i-th data. In the Granger causality, a linear relationship is assumed between items of data in time series data, whereas the method according to the present embodiment can estimate with good precision even for a nonlinear relationship.

<<Behavior of Proximity Between Elements when t→∞>>

In Formula (4) described above, the term corresponding to ˜λ_(m)=1 becomes a constant value when t→∞, and a term corresponding to |˜λ_(m)|<1 becomes zero when t→∞. Therefore, in Formula (4) described above, by setting every ˜λ_(m) other than 1 to 0, the behavior of the proximity between elements when t→∞ can be understood.

<Other Data Analysis Methods Using RKHM>

Kernel PCA will be described as one of the modified examples of the present embodiment. Let x₀, x₁, . . . , x_(T−1)∈X be data each having n elements. By using the same notations as used in Step 2 described above, ˜b_(t) is defined as ˜b_(t)=U_(t)D_(t)U_(t) where D_(t) is a matrix having diagonal components of

$\begin{matrix}\left\lbrack {{Math}.37} \right\rbrack &  \\{\sqrt{\lambda_{t,0}},\ldots,\sqrt{\lambda_{t,m_{t}}},0,\ldots,0} & \text{ }\end{matrix}$

and having non-diagonal components of all zero. Let ˜B_(m) be a matrix having diagonal components of ˜b₀, . . . , ˜b_(m-1) and non-diagonal components of all zero. In the case of setting ε=0, Φ_(m)=Q_(m)(˜B_(m)+R_(m)) holds. Also, it can be shown that C_(m) that satisfies Q_(m)*Q_(m)=C_(m)C_(m)* exists. Therefore, by calculating the singular value decomposition as C_(m)*R_(m)=U_(m)Σ_(m)V_(m)*, and setting w₁=Q_(m)C_(m)u₁, it can be shown that under a condition of v_(t) being normal, w₁ is a vector that maximizes the following formula:

$\begin{matrix}\left\lbrack {{Math}.38} \right\rbrack &  \\{\sum\limits_{t = 0}^{m - 1}{\left\| {w\left\langle {w,v_{t}} \right\rangle_{k}} \right\|}_{k}^{2}} & \text{ }\end{matrix}$

where u_(t) represents a t-th column of U_(m). Also, v_(t) is expressed as follows:

$\begin{matrix}\left\lbrack {{Math}.39} \right\rbrack &  \\{v_{t} = {{\phi\left( x_{t} \right)} - {\sum\limits_{s = 0}^{m - 1}{\phi\left( x_{s} \right)}}}} & \text{ }\end{matrix}$

Therefore, it can be stated that w₁ is a vector that best approximates the residual on the RKHM, and this w₁ will be referred to as the first principal vector. Similarly, w_(t)=Q_(m)C_(m)u_(t) will be referred to as a t-th principal vector. Denoting non-zero eigenvalues of Φ_(m)*Φ_(m) (a T×T matrix whose (s, t) component is k(x_(s), x_(t+1))∈A) as λ₀≥ . . . ≥λ_(l)>0, and the corresponding eigenvectors as v₀, . . . , v_(l), it can be shown that w_(t)=λ_(t)^(−1/2)Φ_(m)v_(t); therefore, calculation is carried out in practice in this way. The proximity between data φ(x_(s)) and the t-th principal vector can be expressed as <w_(t),φ(x_(s))>_(k), and hence, <w_(t),φ(x_(s))>_(k) can be regarded as the t-th principal component of φ(x_(s)). However, <w_(t),φ(x_(s))>_(k) takes a matrix value, and hence, instead of <w_(t),φ(x_(s))>_(k), for example, by using ∥<w_(t),φ(x_(s))>_(k)∥, a distribution of the data can be visualized. For example, visualization in the two-dimensional plane can be achieved by taking ∥<w₁,φ(x_(s))>_(k)∥ in the horizontal axis and ∥<w_(t),φ(x_(s))>_(k)∥ in the vertical axis, and plotting the data. Also, by replacing φ(x_(s)) with the following formula,

$\begin{matrix}\left\lbrack {{Math}.40} \right\rbrack &  \\{{\phi\left( x_{s} \right)} - {\sum\limits_{t = 0}^{m - 1}{\phi\left( x_{t} \right)}}} & \text{ }\end{matrix}$

a centered kernel PCA can be executed as in the case of general kernel PCA using the RKHS.

<Evaluation>

Next, evaluation of the method according to the present embodiment will be described.

<<Goodness of Prediction>>

A Kuramoto model on [0,2π) shown in the following Formula (5) was considered.

$\begin{matrix}\left\lbrack {{Math}.41} \right\rbrack &  \\{\frac{d\theta_{i}}{dt} = {\omega_{i} + {\frac{\kappa}{n}{\sum\limits_{j = 0}^{n - 1}{\sin\left( {\theta_{j} - \theta_{i}} \right)}}}}} & (5)\end{matrix}$

where θ_(i)(0) was assumed to be a random number following a uniform distribution on [0, 2π), and ω_(i) was also assumed to be a random number following the uniform distribution on [0, 2π).

A dynamical system shown in the following Formula (6) obtained by discretizing Formula (5) described above was considered.

$\begin{matrix}\left\lbrack {{Math}.42} \right\rbrack &  \\{x_{t,i} = {x_{{t - 1},i} + {\Delta t\omega_{i}} + {\Delta t\frac{\kappa}{n}{\sum\limits_{j = 0}^{n - 1}{\sin\left( {x_{{t - 1},j} - x_{{t - 1},i}} \right)}}}}} & (6)\end{matrix}$
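The update of Formula (6) can be sketched in a few lines of NumPy. This is a minimal illustration and not the code used for the evaluation: the function names `kuramoto_step` and `simulate`, the seed handling, and the choice not to wrap phases back into [0, 2π) are assumptions made here.

```python
import numpy as np

def kuramoto_step(x, omega, kappa, dt=0.01):
    """One update of Formula (6): returns x_t given x = x_{t-1}."""
    n = x.shape[0]
    # coupling[i] = sum_j sin(x_j - x_i)
    coupling = np.sin(x[None, :] - x[:, None]).sum(axis=1)
    return x + dt * omega + dt * (kappa / n) * coupling

def simulate(n, steps, kappa, dt=0.01, seed=0):
    """Trajectory x_0, ..., x_steps with uniform random initial phases
    and natural frequencies on [0, 2*pi). Phases are not wrapped, which
    is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0 * np.pi, size=n)
    omega = rng.uniform(0.0, 2.0 * np.pi, size=n)
    traj = [x]
    for _ in range(steps):
        x = kuramoto_step(x, omega, kappa, dt)
        traj.append(x)
    return np.stack(traj)  # shape (steps + 1, n)
```

For instance, `simulate(n=200, steps=100, kappa=10.0)` would produce states up to t=100 under the parameter values mentioned in this evaluation.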

Here, on [0, 2π), the following function was considered,

$\begin{matrix}\left\lbrack {{Math}.43} \right\rbrack &  \\{{\overset{\sim}{k}\left( {x,y} \right)} = e^{- {|{e^{ix} - e^{iy}}|}}} & \text{ }\end{matrix}$

where the (i,j) component of k(x_(t),x_(s)) was set to ˜k(x_(t,i),x_(s,j)), and Δt=0.01. Also, parameters were set as n=200, T=10, and m_(t)=j_(t) upon normalization. At this time, for S=100, the magnitude |Q_(T)˜K_(T)Q_(T)*φ(x_(S-1))|_(k) of a predicted value was calculated in the cases where the parameter κ representing the strength of interrelation was set to κ=1 and κ=10.
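To make the kernel concrete, the following sketch evaluates the scalar function ˜k and assembles the n×n matrix k(x_(t),x_(s)) from it. The names `k_tilde` and `k_matrix` are hypothetical, and the code only computes the kernel directly; it does not reproduce the approximation by ˜K_(T).

```python
import numpy as np

def k_tilde(x, y):
    """Scalar kernel on [0, 2*pi): exp(-|e^{ix} - e^{iy}|)."""
    return np.exp(-np.abs(np.exp(1j * x) - np.exp(1j * y)))

def k_matrix(x_t, x_s):
    """n x n matrix whose (i, j) component is k_tilde(x_t[i], x_s[j])."""
    x_t = np.asarray(x_t, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    # Broadcasting pairs every element of x_t with every element of x_s.
    return k_tilde(x_t[:, None], x_s[None, :])
```

Because ˜k depends on the phases only through e^(ix), values that differ by a multiple of 2π are treated as identical, and ˜k(x,x)=1 is the largest possible value.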

A result of plotting values of the respective components of |Q_(T)˜K_(T)Q_(T)*φ(x_(S-1))|_(k) in the case of κ=1 is illustrated in FIG. 4A. Also, a result of plotting values of the respective components of |Q_(T)˜K_(T)Q_(T)*φ(x_(S-1))|_(k) in the case of κ=10 is illustrated in FIG. 4B. Q_(T)˜K_(T)Q_(T)*φ(x_(S-1)) is an approximation of φ(x_(S)); therefore, the (i,j) component of |Q_(T)˜K_(T)Q_(T)*φ(x_(S-1))|_(k) is considered to be an approximation of the (i,j) component of k(x_(S),x_(S)), i.e., of ˜k(x_(S,i),x_(S,j)). Therefore, if x_(S,i) and x_(S,j) are closer to each other, the (i,j) component of |Q_(T)˜K_(T)Q_(T)*φ(x_(S-1))|_(k) should become greater, and if x_(S,i) and x_(S,j) are further apart, the (i,j) component of |Q_(T)˜K_(T)Q_(T)*φ(x_(S-1))|_(k) should become smaller.

In FIG. 4B, (i,j) components are uniformly greater compared to those in FIG. 4A (i.e., a greater value of κ resulted in uniformly greater (i,j) components). Therefore, the values of the components of the predicted value at time S are aligned. In the Kuramoto model, as a certain length of time elapses, a greater κ results in better aligned values of the elements; therefore, it can be understood that the approximation was obtained precisely.

In fact, in the case of κ=1, 10, results of calculating k(x₁₀, x₁₀) and k(x₁₀₀, x₁₀₀) for x₁₀ and x₁₀₀, respectively, obtained directly from Formula (6) described above are illustrated in FIGS. 5A to 5D. Comparing FIG. 4A with FIG. 5C, and FIG. 4B with FIG. 5D, respectively, it can be understood that close values are obtained. Also, comparing FIG. 4B with FIG. 5B and FIG. 5D, it can be understood that, although the elements are not yet completely synchronized at t=10, the sufficiently synchronized state at t=100 is predicted by using ˜K_(T) approximated by using data up to t=10.

<<Behavior of Proximity Between Elements when t→∞>>

A Kuramoto model on [0,2π) shown in the following Formula (7) was considered.

$\begin{matrix}\left\lbrack {{Math}.44} \right\rbrack &  \\{\frac{d\theta_{i}}{dt} = {\omega_{i} + {\frac{1}{n}{\sum\limits_{j = 0}^{n - 1}{\kappa_{i,j}{\sin\left( {\theta_{j} - \theta_{i}} \right)}}}}}} & (7)\end{matrix}$

where θ_(i)(0) was assumed to be a random number following a uniform distribution on [0, 2π), and ω_(i) was also assumed to be a random number following the uniform distribution on [0, 2π).

A dynamical system shown in the following Formula (8) obtained by discretizing Formula (7) described above was considered.

$\begin{matrix}\left\lbrack {{Math}.45} \right\rbrack &  \\{x_{t,i} = {x_{{t - 1},i} + {\Delta t\omega_{i}} + {\Delta t\frac{1}{n}{\sum\limits_{j = 0}^{n - 1}{\kappa_{i,j}{\sin\left( {x_{{t - 1},j} - x_{{t - 1},i}} \right)}}}}}} & (8)\end{matrix}$

Here, on [0, 2π), the following function was considered,

$\begin{matrix}\left\lbrack {{Math}.46} \right\rbrack &  \\{{\overset{\sim}{k}\left( {x,y} \right)} = e^{- {|{e^{ix} - e^{iy}}|}}} & \text{ }\end{matrix}$

where the (i,j) component of k(x_(t),x_(s)) was set to ˜k(x_(t,i),x_(s,j)), and Δt=0.01. Also, ˜K_(T) was calculated with n=50, T=10, and m_(t)=j_(t) upon normalization. Further, under each of the following Setting 1 and Setting 2, Formula (4) described above was calculated. Here, when calculating Formula (4) described above, every ˜λ_(m) other than those close to 1 was set to zero.

Setting 1: In the case of i>25 and j>25, κ_(i,j)=100; otherwise κ_(i,j)=0

Setting 2: In the case of (i<25 or i>35) and (j<25 or j>35), κ_(i,j)=100; otherwise κ_(i,j)=0
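The two coupling patterns and the update of Formula (8) can be written down as follows. This is only a sketch: the function names are hypothetical, and the indices i and j are assumed to be zero-based (0, . . . , n−1), following the summation range in Formula (8).

```python
import numpy as np

def coupling_setting1(n=50):
    """kappa[i, j] = 100 if i > 25 and j > 25, otherwise 0."""
    i, j = np.indices((n, n))
    return np.where((i > 25) & (j > 25), 100.0, 0.0)

def coupling_setting2(n=50):
    """kappa[i, j] = 100 if (i < 25 or i > 35) and (j < 25 or j > 35),
    otherwise 0."""
    i, j = np.indices((n, n))
    keep_i = (i < 25) | (i > 35)
    keep_j = (j < 25) | (j > 35)
    return np.where(keep_i & keep_j, 100.0, 0.0)

def kuramoto_step_weighted(x, omega, kappa, dt=0.01):
    """One update of Formula (8) with an element-wise coupling matrix."""
    n = x.shape[0]
    # coupling[i] = sum_j kappa[i, j] * sin(x_j - x_i)
    coupling = (kappa * np.sin(x[None, :] - x[:, None])).sum(axis=1)
    return x + dt * omega + (dt / n) * coupling
```

Plotting either matrix with i on the vertical axis and j on the horizontal axis gives a block pattern of the kind described for FIGS. 7A and 7B.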

A result of calculating Formula (4) described above and plotting values of the respective components of the calculated result under Setting 1 described above is illustrated in FIG. 6A. Also, a result of calculating Formula (4) described above and plotting values of the respective components of the calculated result under Setting 2 described above is illustrated in FIG. 6B. Also, taking i in the vertical axis and j in the horizontal axis, a result of plotting the magnitudes of κ_(i,j) under Setting 1 described above is illustrated in FIG. 7A. Similarly, a result of plotting the magnitudes of κ_(i,j) under Setting 2 described above is illustrated in FIG. 7B. In the Kuramoto model, elements that interact with each other (i.e., elements having large values of κ_(i,j)) take closer values once a sufficiently long time has elapsed. Therefore, it can be understood that the behavior of the proximity when t→∞ can be approximated.

<Hardware Configuration of Relationship Extraction Device 10>

Finally, a hardware configuration of the relationship extraction device 10 according to the present embodiment will be described with reference to FIG. 8. FIG. 8 is a diagram illustrating an example of a hardware configuration of the relationship extraction device 10 according to the present embodiment.

As illustrated in FIG. 8, the relationship extraction device 10 according to the present embodiment is implemented by a generic computer or computer system, and includes an input device 401, a display device 402, an external I/F 403, a communication I/F 404, a processor 405, and a memory device 406. These hardware components are connected via a bus 407 so as to be capable of communicating with each other.

The input device 401 is, for example, a keyboard, a mouse, a touch panel, or the like. The display device 402 is, for example, a display or the like. Note that the relationship extraction device 10 may or may not have at least one of the input device 401 and the display device 402.

The external I/F 403 is an interface with an external device. The external device includes a recording medium 403 a or the like. The relationship extraction device 10 can execute read and write with the recording medium 403 a via the external I/F 403. The recording medium 403 a may store, for example, one or more programs that implement the approximate operator generation processing unit 100 and the relationship extraction processing unit 200.

Note that the recording medium 403 a includes, for example, a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD memory card (Secure Digital memory card), a USB (Universal Serial Bus) memory card, and the like.

The communication I/F 404 is an interface for connecting the relationship extraction device 10 to a communication network. Note that one or more programs that implement the approximate operator generation processing unit 100 and the relationship extraction processing unit 200 may be obtained (downloaded) from a predetermined server device or the like via the communication I/F 404.

The processor 405 is any of various types of arithmetic/logic devices, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and the like. The approximate operator generation processing unit 100 and the relationship extraction processing unit 200 are implemented by, for example, processes that one or more programs stored in the memory device 406 cause the processor 405 to execute.

The memory device 406 is any of various types of storage devices such as, for example, an HDD (Hard Disk Drive), SSD (Solid State Drive), RAM (Random Access Memory), ROM (Read-Only Memory), flash memory, and the like. The storage unit 300 is implemented by, for example, the memory device 406. However, the storage unit 300 may be implemented by, for example, a storage device connected to the relationship extraction device 10 through a communication network.

By having the hardware configuration illustrated in FIG. 8, the relationship extraction device 10 according to the present embodiment can implement the approximate operator generation process and the relationship extraction process described above. Note that the hardware configuration illustrated in FIG. 8 is an example, and the relationship extraction device 10 may have another hardware configuration. For example, the relationship extraction device 10 may have more than one processor 405 or more than one memory device 406.

The present invention is not limited to the embodiments described above that have been specifically disclosed, and various modifications, changes, combinations with known techniques, and the like can be made within a range not deviating from the description of the claims.

The present application is based on a base application No. 2020-035051 filed in Japan on Mar. 2, 2020, the entire contents of which are hereby incorporated by reference.

List of Reference Numerals

-   10 relationship extraction device
-   100 approximate operator generation processing unit
-   101 obtaining unit
-   102 approximate operator generation unit
-   200 relationship extraction processing unit
-   201 obtaining unit
-   202 relationship extraction unit
-   300 storage unit
-   401 input device
-   402 display device
-   403 external I/F
-   403 a recording medium
-   404 communication I/F
-   405 processor
-   406 memory device
-   407 bus

1. A relationship extraction device comprising: a memory; and a processor configured to execute obtaining a set of data {x₀, . . . , x_(T−1)}⊆X each having multiple elements and a set of data {y₀=f(x₀), . . . , y_(T−1)=f(x_(T−1))}⊆Y each having multiple elements, where f is any mapping; generating an approximate operator that approximates a Perron-Frobenius operator K satisfying Kφ₁(x_(t))=φ₂(y_(t)) for t=0, . . . , T−1, wherein φ₁ is a feature mapping with respect to a positive definite kernel function k₁ on X×X that takes C*-algebra values, and φ₂ is a feature mapping with respect to a positive definite kernel function k₂ on Y×Y that takes C*-algebra values; obtaining data x_(t) and x_(s) as targets of relationship extraction; and extracting a relationship between each element of x_(t) and each element of x_(s) by using the approximate operator.
 2. The relationship extraction device as claimedin claim 1, wherein the extracting extracts the relationship foranalyzing anomaly detection or causal estimation.
 3. The relationshipextraction device as claimed in claim 1, wherein the extracting extractsa C*-algebra value representing the relationship, by approximating aninner product <x_(t), x_(s)>_(k) defined on an RKHM (reproducing kernelHilbert C*-module) with respect to the positive definite kernel functionk₁, by the approximate operator.
4. The relationship extraction device as claimed in claim 1, wherein in a case of X=Y, y_(t)=x_(t+1), k₁=k₂=k, and φ₁=φ₂=φ, the generating generates the approximate operator by using an operator ^K with which ^Kφ(x_(T−1)) approximates φ(x_(T)), and an orthonormal projection from an RKHM with respect to a positive definite kernel function k to a space represented by a linear combination of φ(x_(t)) and C*-algebra values.
5. The relationship extraction device as claimed in claim 1, wherein in a case of X≠Y, the generating generates the approximate operator by using a linear mapping ^K with which ^Kφ₁(x_(t)) approximates φ₂(y_(t)), and an orthonormal projection from an RKHM with respect to the positive definite kernel function k₁ to a space represented by a linear combination of φ₁(x_(t)) and C*-algebra values.
6. A method of extracting relationship executed by a computer including a memory and a processor, the method comprising: obtaining a set of data {x₀, . . . , x_(T−1)}⊆X each having multiple elements and a set of data {y₀=f(x₀), . . . , y_(T−1)=f(x_(T−1))}⊆Y each having multiple elements, where f is any mapping; generating an approximate operator that approximates a Perron-Frobenius operator K satisfying Kφ₁(x_(t))=φ₂(y_(t)) for t=0, . . . , T−1, wherein φ₁ is a feature mapping with respect to a positive definite kernel function k₁ on X×X that takes C*-algebra values, and φ₂ is a feature mapping with respect to a positive definite kernel function k₂ on Y×Y that takes C*-algebra values; obtaining data x_(t) and x_(s) as targets of relationship extraction; and extracting a relationship between each element of x_(t) and each element of x_(s) by using the approximate operator.
7. A non-transitory computer-readable recording medium having computer-readable instructions stored thereon, which when executed, cause a computer to function as the relationship extraction device as claimed in claim 1.